Optimal Time and Space Construction of Suffix Arrays and LCP Arrays for Integer Alphabets
Author
Abstract
Suffix arrays and LCP arrays are among the most fundamental data structures and are widely used in string processing. Many problems can be solved efficiently using a suffix array alone, or a suffix array together with an LCP array. In this paper, we consider two problems for a string of length N whose characters are represented as integers in [1, ..., σ] for 1 ≤ σ ≤ N, where the string contains σ distinct characters: (1) construction of the suffix array and (2) simultaneous construction of both the suffix array and the LCP array. In the word RAM model, we propose algorithms that solve both problems in O(N) time using O(1) extra words, which is optimal in both time and space. Extra words are the space required beyond that of the input string and the output suffix array and LCP array. Our contribution improves on the previous most efficient algorithm for suffix array construction, proposed by [Nong, TOIS 2013], which runs in O(N) time using σ+O(1) extra words. It also improves on the previous most efficient solution for constructing both the suffix array and the LCP array, which runs in O(N) time using σ+O(1) extra words by combining [Nong, TOIS 2013] and [Manzini, SWAT 2004]. Another time- and space-optimal algorithm for constructing the suffix array was proposed very recently and independently by [Li et al., arXiv 2016]. Our algorithm is simpler than theirs, and it allows us to solve the second problem in optimal time and space.
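To make the two output arrays concrete, here is a minimal Python sketch of what a suffix array and an LCP array contain. It is not the algorithm proposed in the paper: the suffix array is built by naive comparison sorting, the LCP array by the well-known rank-based linear scan, and both use O(N) extra words. The function names `suffix_array` and `lcp_array` and the 0-based indexing are assumptions made for illustration only.

```python
def suffix_array(s):
    """Naive construction: sort suffix start positions by suffix content.

    Works for a str or a list of integers. This is NOT linear time;
    it only illustrates what the array contains.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """lcp[r] = length of the longest common prefix of the suffixes
    starting at sa[r - 1] and sa[r]; lcp[0] is defined as 0 here.

    Uses an inverse (rank) array, so it needs O(N) extra words,
    unlike the O(1)-extra-word setting described in the abstract.
    """
    n = len(s)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0  # current match length, carried over between iterations
    for i in range(n):
        if rank[i] == 0:
            h = 0
            continue
        j = sa[rank[i] - 1]  # suffix preceding suffix i in sorted order
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        lcp[rank[i]] = h
        if h > 0:
            h -= 1  # the next suffix shares at least h - 1 characters
    return lcp


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    print(sa)
    print(lcp_array(text, sa))
```

The same functions accept an integer sequence such as `[2, 1, 3, 1, 3, 1]`, which matches the integer-alphabet setting of the abstract.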
Similar resources
Linear-Time Construction of Suffix Arrays
The time complexity of suffix tree construction has been shown to be equivalent to that of sorting: O(n) for a constant-size alphabet or an integer alphabet and O(n log n) for a general alphabet. However, previous algorithms for constructing suffix arrays have the time complexity of O(n log n) even for a constant-size alphabet. In this paper we present a linear-time algorithm to construct suffix ...
Engineering External Memory LCP Array Construction: Parallel, In-Place and Large Alphabet
The suffix array augmented with the LCP array is perhaps the most important data structure in modern string processing. There has been a lot of recent research activity on constructing these arrays in external memory. In this paper, we engineer the two fastest LCP array construction algorithms (ESA 2016) and improve them in three ways. First, we speed up the algorithms by up to a factor of two ...
The Virtual Suffix Tree: An Efficient Data Structure for Suffix Trees and Suffix Arrays
We introduce the VST (virtual suffix tree), an efficient data structure for suffix trees and suffix arrays. Starting from the suffix array, we construct the suffix tree, from which we derive the virtual suffix tree. The VST provides the same functionality as the suffix tree, including suffix links, but at a much smaller space requirement. It has the same linear time construction even for large ...
Optimal In-Place Suffix Sorting
The suffix array is a fundamental data structure for many applications that involve string searching and data compression. Designing time- and space-efficient suffix array construction algorithms has attracted significant attention, and considerable advances have been made in the last 20 years. We obtain the suffix array construction algorithms that are optimal both in time and space for both intege...
Advanced topics in algorithms
Lowest common ancestor algorithms are in [12, 19, 2]. Algorithms to construct suffix trees in linear time are in [22, 18, 21, 5]. Suffix arrays were introduced in [17]. The linear time construction algorithm for suffix arrays is from [14]. The simple construction of the LCP array from the suffix array is from [15]. The k-mismatch problem is discussed in [7, 16, 1]. The FM index is from [6]. Som...
Journal title:
- CoRR
Volume abs/1703.01009 Issue
Pages -
Publication date 2017